Representing Discourse Coherence: A Corpus-Based Study

نویسندگان

  • Florian Wolf
  • Edward Gibson
چکیده

This article aims to present a set of discourse structure relations that are easy to code and to develop criteria for an appropriate data structure for representing these relations. Discourse structure here refers to informational relations that hold between sentences in a discourse. The set of discourse relations introduced here is based on Hobbs (1985). We present a method for annotating discourse coherence structures that we used to manually annotate a database of 135 texts from the Wall Street Journal and the AP Newswire. All texts were independently annotated by two annotators. Kappa values of greater than 0.8 indicated good interannotator agreement. We furthermore present evidence that trees are not a descriptively adequate data structure for representing discourse structure: In coherence structures of naturally occurring texts, we found many different kinds of crossed dependencies, as well as many nodes with multiple parents. The claims are supported by statistical results from our hand-annotated database of 135 texts.

برای دانلود متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

منابع مشابه

Representing discourse coherence: A corpus-based analysis

We present a set of discourse structure relations that are easy to code, and develop criteria for an appropriate data structure for representing these relations. Discourse structure here refers to informational relations that hold between sentences in a discourse (cf. Hobbs, 1985). We evaluated whether trees are a descriptively adequate data structure for representing coherence. Trees are widel...

متن کامل

Discourse and Coherence: Revisiting Specific Conventions of the Centering Theory

This paper discusses a corpus-based study whose aim is to evaluate specific conventions of the centering theory and to establish whether they should be revisited. In particular, the study explores the relation between discourse coherence and several parameters such as the definition of an utterance, the varieties of anaphora considered, the forms of the discourse entities and the type of genre.

متن کامل

A Corpus-based Study of Lexical Bundles in Discussion Section of Medical Research Articles

There has been increasing interest in utilizing corpora in linguistic research and pedagogy in recent years. Rhetorical organization of different sections of research articles may appear similar in various disciplines, but close examination may show subtle differences nonetheless. One of the features that has been at the center of attention especially in recent years is the idiomaticity of a di...

متن کامل

Modeling Discourse Coherence for the Automated Scoring of Spontaneous Spoken Responses

This study describes an approach for modeling the discourse coherence of spontaneous spoken responses in the context of automated assessment of non-native speech. Although the measurement of discourse coherence is typically a key metric in human scoring rubrics for assessments of spontaneous spoken language, little prior research has been done to assess a speaker’s coherence in the context of a...

متن کامل

The Effect of Colligational Corpus-based Instruction on Enhancing the Pragmalinguistic Knowledge of Request Speech Act among Iranian Intermediate EFL Learners

This study investigated the effectiveness of colligational corpus-based instruction on enhancing the pragmalinguistic knowledge of speech act of request among Iranian intermediate EFL learners. The objective of the study was to find out whether or not providing students with corpora through using colligational instruction had any significant effects on enhancing their pragmalinguistic knowledge...

متن کامل

ذخیره در منابع من


  با ذخیره ی این منبع در منابع من، دسترسی به آن را برای استفاده های بعدی آسان تر کنید

برای دانلود متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

عنوان ژورنال:
  • Computational Linguistics

دوره 32  شماره 

صفحات  -

تاریخ انتشار 2005